TrIMS: Transparent and Isolated Model Sharing for Low Latency Deep Learning Inference in Function as a Service Environments
Deep neural networks (DNNs) have become core computation components within
low latency Function as a Service (FaaS) prediction pipelines: including image
recognition, object detection, natural language processing, speech synthesis,
and personalized recommendation pipelines. Cloud computing, as the de-facto
backbone of modern computing infrastructure for both enterprise and consumer
applications, has to be able to handle user-defined pipelines of diverse DNN
inference workloads while maintaining isolation and latency guarantees, and
minimizing resource waste. The current solution for guaranteeing isolation
within FaaS is suboptimal -- suffering from "cold start" latency. A major cause
of such inefficiency is the need to move large amounts of model data within and
across servers. We propose TrIMS as a novel solution to address these issues.
Our proposed solution consists of a persistent model store across the GPU, CPU,
local storage, and cloud storage hierarchy, an efficient resource management
layer that provides isolation, and a succinct set of application APIs and
container technologies for easy and transparent integration with FaaS, Deep
Learning (DL) frameworks, and user code. We demonstrate our solution by
integrating TrIMS with the Apache MXNet framework, achieving up to 24x speedup
in latency for image classification models and up to 210x speedup for large
models, along with up to 8x improvement in system throughput.

Comment: In Proceedings CLOUD 201
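The persistent model store at the heart of this design can be pictured as a shared cache keyed by model name, so that repeated invocations skip the reload that causes "cold start" latency. A minimal single-process Python sketch (not the TrIMS API; `ModelStore` and its loader are hypothetical names):

```python
import threading

class ModelStore:
    """Illustrative in-process model cache: loaded models stay resident
    across requests, so repeated invocations skip the costly reload that
    causes "cold start" latency. (A sketch, not the TrIMS API.)"""

    def __init__(self, loader):
        self._loader = loader          # user-supplied: model name -> model
        self._cache = {}
        self._lock = threading.Lock()  # serialize concurrent lookups

    def get(self, name):
        with self._lock:
            if name not in self._cache:            # cold path: load once
                self._cache[name] = self._loader(name)
            return self._cache[name]               # warm path: already resident
```

A first `get` pays the load cost; every later `get` for the same name returns the resident copy, mirroring how a persistent store amortizes model movement across invocations.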
ACC Saturator: Automatic Kernel Optimization for Directive-Based GPU Code
Automatic code optimization is a complex process that typically involves the
application of multiple discrete algorithms that modify the program structure
irreversibly. However, the design of these algorithms is often monolithic, and
they require repetitive implementation to perform similar analyses due to the
lack of cooperation. To address this issue, modern optimization techniques,
such as equality saturation, allow for exhaustive term rewriting at various
levels of inputs, thereby simplifying compiler design.
In this paper, we apply equality saturation to optimize the sequential code
used in directive-based GPU programming. Our approach simultaneously achieves
less computation, fewer memory accesses, and higher memory throughput. Our
fully automated framework constructs single-assignment forms of the input that
can be rewritten exhaustively while preserving dependencies, and then extracts
the optimal candidates.
Through practical benchmarks, we demonstrate a significant performance
improvement on several compilers. Furthermore, we highlight the advantages of
computational reordering and emphasize the significance of memory-access order
for modern GPUs.
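Equality saturation grows a set of terms equal to the input under rewrite rules and then extracts the cheapest representative. A toy Python sketch of the idea (the rules and cost model here are illustrative, not ACC Saturator's):

```python
def rewrites(expr):
    """Yield single-step rewrites of an expression (nested tuples).
    Toy rules standing in for a real rewrite-rule set."""
    if not isinstance(expr, tuple):
        return
    op, a, b = expr
    if op == '*' and b == 2:
        yield ('+', a, a)              # x*2 == x+x (strength reduction)
    if op == '*' and b == 1:
        yield a                        # x*1 == x
    if op == '+':
        yield (op, b, a)               # commutativity
    for ra in rewrites(a):
        yield (op, ra, b)              # rewrite inside left subterm
    for rb in rewrites(b):
        yield (op, a, rb)              # rewrite inside right subterm

def saturate(expr, max_iters=10):
    """Grow the set of terms equal to `expr` until no rule fires (saturation)."""
    seen = {expr}
    for _ in range(max_iters):
        new = {r for e in seen for r in rewrites(e)} - seen
        if not new:
            break
        seen |= new
    return seen

def cost(expr):
    """Extraction: count operations, charging '*' more than '+'."""
    if not isinstance(expr, tuple):
        return 0
    op, a, b = expr
    return (2 if op == '*' else 1) + cost(a) + cost(b)

# Saturate (x*1)*2, then extract the cheapest equivalent term: x + x.
best = min(saturate(('*', ('*', 'x', 1), 2)), key=cost)
```

Because all rewrites are kept rather than applied destructively, the choice among equivalent forms is deferred to a single extraction step, which is the property that lets such frameworks avoid the phase-ordering problems of monolithic optimization passes.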
JACC: An OpenACC Runtime Framework with Kernel-Level and Multi-GPU Parallelization
The rapid development in computing technology has paved the way for
directive-based programming models towards a principal role in maintaining
software portability of performance-critical applications. Such models require
minimal engineering cost to enable computational acceleration on multiple
architectures: programmers only need to add metadata to sequential code.
Obtaining the best possible efficiency, however, is often challenging.
Directives inserted by the programmer can have side effects that limit the
compiler optimizations available, which can degrade performance. This is
exacerbated when targeting multi-GPU systems, as pragmas do not automatically
adapt to such systems and require expensive, time-consuming code adjustment by
programmers.
This paper introduces JACC, an OpenACC runtime framework which enables the
dynamic extension of OpenACC programs by serving as a transparent layer between
the program and the compiler. We add a versatile code-translation method for
multi-device utilization by which manually-optimized applications can be
distributed automatically while keeping original code structure and
parallelism. We show nearly linear scaling of kernel execution in some cases on
NVIDIA V100 GPUs. When adaptively using multiple GPUs, the resulting
performance improvements amortize the latency of GPU-to-GPU communication.

Comment: Extended version of a paper to appear in: Proceedings of the 28th
IEEE International Conference on High Performance Computing, Data, and
Analytics (HiPC), December 17-18, 202
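Distributing a kernel across devices, as described above, rests on splitting its iteration space among the GPUs while keeping the original parallel structure. A minimal sketch of such block partitioning (illustrative only, not the JACC implementation):

```python
def partition(n, num_devices):
    """Block-partition an iteration space [0, n) across devices, roughly as
    a multi-GPU runtime might distribute a data-parallel kernel.
    (Illustrative sketch, not the JACC implementation.)"""
    base, extra = divmod(n, num_devices)
    bounds = []
    start = 0
    for d in range(num_devices):
        size = base + (1 if d < extra else 0)  # spread the remainder evenly
        bounds.append((start, start + size))
        start += size
    return bounds

# Ten iterations over four devices: [(0, 3), (3, 6), (6, 8), (8, 10)]
```

Each device then executes the original kernel over its own sub-range, which is what lets manually optimized single-GPU code be distributed without restructuring it.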
OmpSs-2 and OpenACC interoperation
We propose an interoperation mechanism to enable novel composability across pragma-based programming models. We study and propose a clear separation of duties, and implement our approach by augmenting the OmpSs-2 programming model, compiler, and runtime system to support OmpSs-2 + OpenACC programming. To validate our proposal, we port ZPIC, a kinetic plasma simulator, to our hybrid OmpSs-2 + OpenACC implementation. We compare our approach against OpenACC versions of ZPIC on a multi-GPU HPC system. We show that our approach provides automatic asynchronous and multi-GPU execution, removing a significant burden from the application's developer, while also outperforming manually programmed versions thanks to better utilization of the hardware.

This work has been part of the EPEEC project. The EPEEC project has received funding from the European Union's Horizon 2020 research and innovation programme under grant agreement No 801051. This paper was also partially funded by the Ministerio de Ciencia e Innovación Agencia Estatal de Investigación (PID2019-107255GB-C21/AEI/10.13039/501100011033). We gratefully acknowledge the support of NVIDIA AI Technology Center (NVAITC) Europe, who provided us remote access to an NVIDIA DGX-1.
Measurement of the cosmic ray spectrum above eV using inclined events detected with the Pierre Auger Observatory
A measurement of the cosmic-ray spectrum for energies exceeding
eV is presented, which is based on the analysis of showers
with zenith angles greater than detected with the Pierre Auger
Observatory between 1 January 2004 and 31 December 2013. The measured spectrum
confirms a flux suppression at the highest energies. Above
eV, the "ankle", the flux can be described by a power law with
index followed by
a smooth suppression region. For the energy () at which the
spectral flux has fallen to one-half of its extrapolated value in the absence
of suppression, we find
eV.

Comment: Replaced with published version. Added journal reference and DOI.
Energy Estimation of Cosmic Rays with the Engineering Radio Array of the Pierre Auger Observatory
The Auger Engineering Radio Array (AERA) is part of the Pierre Auger
Observatory and is used to detect the radio emission of cosmic-ray air showers.
These observations are compared to the data of the surface detector stations of
the Observatory, which provide well-calibrated information on the cosmic-ray
energies and arrival directions. The response of the radio stations in the 30
to 80 MHz regime has been thoroughly calibrated to enable the reconstruction of
the incoming electric field. For the latter, the energy deposit per area is
determined from the radio pulses at each observer position and is interpolated
using a two-dimensional function that takes into account signal asymmetries due
to interference between the geomagnetic and charge-excess emission components.
The spatial integral over the signal distribution gives a direct measurement of
the energy transferred from the primary cosmic ray into radio emission in the
AERA frequency range. We measure 15.8 MeV of radiation energy for a 1 EeV air
shower arriving perpendicularly to the geomagnetic field. This radiation energy
-- corrected for geometrical effects -- is used as a cosmic-ray energy
estimator. Performing an absolute energy calibration against the
surface-detector information, we observe that this radio-energy estimator
scales quadratically with the cosmic-ray energy as expected for coherent
emission. We find an energy resolution of the radio reconstruction of 22% for
the data set and 17% for a high-quality subset containing only events with at
least five radio stations with signal.

Comment: Replaced with published version. Added journal reference and DOI.
Measurement of the Radiation Energy in the Radio Signal of Extensive Air Showers as a Universal Estimator of Cosmic-Ray Energy
We measure the energy emitted by extensive air showers in the form of radio
emission in the frequency range from 30 to 80 MHz. Exploiting the accurate
energy scale of the Pierre Auger Observatory, we obtain a radiation energy of
15.8 \pm 0.7 (stat) \pm 6.7 (sys) MeV for cosmic rays with an energy of 1 EeV
arriving perpendicularly to a geomagnetic field of 0.24 G, scaling
quadratically with the cosmic-ray energy. A comparison with predictions from
state-of-the-art first-principle calculations shows agreement with our
measurement. The radiation energy provides direct access to the calorimetric
energy in the electromagnetic cascade of extensive air showers. Comparison with
our result thus allows the direct calibration of any cosmic-ray radio detector
against the well-established energy scale of the Pierre Auger Observatory.

Comment: Replaced with published version. Added journal reference and DOI.
Supplemental material in the ancillary file.
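The quadratic scaling reported above can be inverted to sketch the energy estimator. Assuming only the numbers quoted in the abstract (15.8 MeV of radiation energy at 1 EeV) and deliberately omitting the geometry corrections the text describes:

```python
import math

# Reference point from the abstract: a 1 EeV cosmic ray arriving
# perpendicular to the geomagnetic field radiates 15.8 MeV in 30-80 MHz.
E_REF_EV = 1e18       # reference cosmic-ray energy, eV (1 EeV)
E_RAD_REF_MEV = 15.8  # radiation energy at the reference energy, MeV

def cosmic_ray_energy(e_rad_mev):
    """Invert the quadratic scaling E_rad ~ E_CR^2 to estimate the
    cosmic-ray energy (in eV) from a measured radiation energy (in MeV).
    Simplified sketch: the geometry corrections (e.g. for the angle to
    the geomagnetic field) described in the abstract are omitted."""
    return E_REF_EV * math.sqrt(e_rad_mev / E_RAD_REF_MEV)
```

Quadrupling the radiation energy thus doubles the estimated cosmic-ray energy, which is the signature of coherent emission noted in the abstract.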
The Standard European Vector Architecture (SEVA): a coherent platform for the analysis and deployment of complex prokaryotic phenotypes
The 'Standard European Vector Architecture' database (SEVA-DB, http://seva.cnb.csic.es) was conceived as a user-friendly, web-based resource and a material clone repository to assist in the choice of optimal plasmid vectors for de-constructing and re-constructing complex prokaryotic phenotypes. The SEVA-DB adopts simple design concepts that facilitate the swapping of functional modules and the extension of genome engineering options to microorganisms beyond typical laboratory strains. Under the SEVA standard, every DNA portion of the plasmid vectors is minimized, edited for flaws in their sequence and/or functionality, and endowed with physical connectivity through three inter-segment insulators that are flanked by fixed, rare restriction sites. Such a scaffold enables the exchangeability of multiple origins of replication and diverse antibiotic selection markers to shape a frame for their further combination with a large variety of cargo modules that can be used for varied end-applications. The core collection of constructs that are available at the SEVA-DB has been produced as a starting point for the further expansion of the formatted vector platform. We argue that adoption of the SEVA format can become a shortcut to fill the phenomenal gap between the existing power of DNA synthesis and the actual engineering of predictable and efficacious bacteria.
Consensus standards for acquisition, measurement, and reporting of intravascular optical coherence tomography studies
Objectives: The purpose of this document is to make the output of the International Working Group for Intravascular Optical Coherence Tomography (IWG-IVOCT) Standardization and Validation available to medical and scientific communities, through a peer-reviewed publication, in the interest of improving the diagnosis and treatment of patients with atherosclerosis, including coronary artery disease.

Background: Intravascular optical coherence tomography (IVOCT) is a catheter-based modality that acquires images at a resolution of ∼10 μm, enabling visualization of blood vessel wall microstructure in vivo at an unprecedented level of detail. IVOCT devices are now commercially available worldwide, there is an active user base, and the interest in using this technology is growing. Incorporation of IVOCT in research and daily clinical practice can be facilitated by the development of uniform terminology and consensus-based standards on use of the technology, interpretation of the images, and reporting of IVOCT results.

Methods: The IWG-IVOCT, comprising more than 260 academic and industry members from Asia, Europe, and the United States, formed in 2008 and convened on the topic of IVOCT standardization through a series of 9 national and international meetings.

Results: Knowledge and recommendations from this group on key areas within the IVOCT field were assembled to generate this consensus document, authored by the Writing Committee, composed of academicians who have participated in meetings and/or writing of the text.

Conclusions: This document may be broadly used as a standard reference regarding the current state of the IVOCT imaging modality, intended for researchers and clinicians who use IVOCT and analyze IVOCT data.